Spring 2018

PREP

  1. https://github.com/dlab-geo/r-geospatial-workshop
  • Click Clone or Download and download the zip file
  • Upzip the zip file and make a note of the folder in which it is located
  1. Open RStudio and start a new script

  2. Follow along by opening r-geospatial-workshop-pt1-sp2018.html

  3. Install required libraries

install.packages(
  c("sp","rgdal","tmap","classInt","RColorBrewer",
    "ggplot2","leaflet", "ggmap"), dependencies=TRUE
)

Geospatial Data in R

Workshop Goals

  • Intro to geospatial data

  • Intro to coordinate reference systems (CRS)

  • Classes & methods for working with spatial data in R

  • Mapping geospatial data

  • Practice

About Me

About you

Who are you?

Why are you here?

Getting Started

Getting Started

  1. Open RStudio and start a new script

  2. Follow along by opening r-geospatial-workshop-pt1-sp2018.html

Getting Started

  1. In RStudio, install required libraries
install.packages(
  c("sp","rgdal","tmap","classInt","RColorBrewer",
    "ggplot2","leaflet", "ggmap"), dependencies=TRUE
)

Geographic Data

Geographic Data

are data about locations on or near the surface of the Earth.

Place names

convey geographic information but don't unambigously specify location

Geospatial data

represent location geometrically with coordinates

46.130479, -117.134167

Coordinate Reference System

Coordinates indicate specific locations on the Earth when associated with a geographic coordinate reference system or CRS.

Geographic Coordinate Reference Systems

specify

  1. the shape of the Earth as the major & minor axes of an ellipsoid
  2. the origin (0,0) - equator (~X), prime meridian (~Y)
  3. fit of the CRS to the Earth (center of the earth), aka Datum
  4. units, eg lat/lon expressed as decimal degrees or DMS

Because of variations in 1-3, there are many geographic CRSs!

WGS84

The World Geodetic System of 1984 is the default geographic CRS used today.

WGS84 is the default CRS for most GIS software

Almost all data with lon/lat coordinates are assumed to be WGS84 unless otherwise specified

[NAD83] is another common geographic CRS used by US agencies like the Census.

WGS84 and NAD83 are so similar that differences are often ignored except for applications requiring high locational accuracy.

Map Projections

A Projected CRS applies a map projection to a Geographic CRS

Map Projection: mathematial transformation from curved to flat surface

Projected CRSs

There are many, many, many projected CRSs

All introduce distortion, eg in shape, area, distance, direction

No best one for all purposes

Selection depends on location, extent and purpose

Different Projected CRSs

Spatial Data

Spatial data is a more generic term that is not just for geographic data.

Spatial data are powerful because

  • dynamically determine spatial metrics like area and length,
  • characteristics like distance and direction, and
  • relationships like inside and intersects from these data.

Types of Spatial Data

Vector Data

Points, lines and Polygons

Raster Data

Regular grid of cells (or pixels)

  • We won't be covering Raster data in this workshop*

Softare for working with Geospatial Data

  • Why special software?

Software for working with Geospatial Data

Most software can only query and analyze numbers and text

  • Databases, spreadsheets, statistical software packages

GIS - Geographic Information Systems

Software that can import, create, store, edit, visualize and analyze geospatial data

  • data types/objects to represent geospatial data
    • locations represented with coordinates
    • and referenced to the surface of the earth via CRSs
  • methods to operate on those representations

Types of GIS

Desktop GIS - ArcGIS, QGIS

Software extended to support geospatial data - Tableau, R

Spatial Databases - Postgresql

Web GIS - ArcGIS Online, CARTO

Custom software - leaflet web maps

Why R for Geospatial Data?

Why R for Geospatial Data?

You already use R

Reproducibility

Free & Open Source

Cutting edge

Thousands of Cool add-ons

  • Shiny, rleaflet

Geospatial Data File formats

Common File Formats

ESRI Shapefile

This is one of the most, if not the most common spatial vector data file formats.

Old but everywhere!

Gotchas: 2GB limit, 10 char column names, poor unicode support

CSV (Comma Separated Values) Files

The simplest, most common format for point data

"ID","name”,”x”,”y”,”taste","price","crowded","food"
"1","babette",-122.255374,37.868428,10,4,1,1
"2","musical",-122.260698,37.868383,7,3.25,1,1
"3","starbucks",-122.266057,37.870441,6,2.95,1,0
"4","yalis",-122.266385,37.873528,7,2.95,0,0
"5","berkeleyesp",-122.268681,37.873664,3,3.25,1,0
"6","fertile",-122.268863,37.874934,5,3.25,0,1

Geospatial Data in R

Geospatial Data in R

There are many approaches to and packages for working with geospatial data in R.

One approach is to keep it simple and store geospatial data in a data frame.

This approach is most common when the data are point data in CSV files.

About the Sample Data

San Francisco Open Data Portal https://data.sfgov.org

SF Property Tax Rolls

This data set includes the Office of the Assessor-Recorder’s secured property tax roll spanning from 2007 to 2016.

We are using this as a proxy for home values.

Load the CSV file into a data frame

## Warning: package 'dplyr' was built under R version 3.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
sfhomes <- read.csv('data/sf_properties.csv')  
head(sfhomes,6)

Explore the data

class(sfhomes) # what type of data object?
dim(sfhomes) # how many rows and columns
str(sfhomes) # display the structure of the object
head(sfhomes) # take a look at the first 10 records
summary(sfhomes) # explore the range of values
hist(sfhomes$totvalue)  # plot he range of values for the totvalue column

Plot of points

plot(sfhomes$lon, sfhomes$lat) # using base plot function

Nice Maps with ggplot2

library(ggplot2)

ggplot() + geom_point(data=sfhomes, aes(lon,lat), col="red", size=1)

Color points by totvalue

ggplot() + geom_point(data=sfhomes, aes(lon,lat, col=totvalue))

Subset the Data

for SalesYear equal to 2015

sfhomes15 <- subset(sfhomes, as.numeric(SalesYear) == 2015)

nrow(sfhomes15) # How many records?
## [1] 889

Always Explore

hist(sfhomes15$totvalue) # What is the distribution of totvalue?

Challenge - Make a map

of sfhomes15 using ggplot and set the color to the totvalue column.

ggplot() + geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

ggmap extends ggplot

Create basemaps on which you can display your data.

Geocode place names and addresses to get point coordinates.

and more…

ggmap

library(ggmap)

# fetch map data (default=Google) to plot
# ?get_map
sf_map <- get_map("San Francisco, CA")  
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco,+CA&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco,%20CA&sensor=false

Display the ggmap of SF

ggmap(sf_map)

Syntax similar to ggplot

ggmap(sf_map) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Customize get_map

Create a basemap zoomed to the extent of our data

# Get the center point of the data
sf_ctr <- c(lon = mean(sfhomes15$lon), lat = mean(sfhomes15$lat))
sf_ctr  # take a look
##        lon        lat 
## -122.43209   37.75946
# create the map
sf_basemap <- get_map(sf_ctr, zoom=12, scale=1)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.759457,-122.432087&zoom=12&size=640x640&scale=1&maptype=terrain&language=en-EN&sensor=false

Plot the basemap with the data overlay

ggmap(sf_basemap) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Change the basemap

See ?get_map to get available options

sf_basemap_lite <- get_map(sf_ctr, zoom=12, scale=1, 
                            maptype = "toner-lite", source="stamen")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.759457,-122.432087&zoom=12&size=640x640&scale=2&maptype=terrain&sensor=false
## Map from URL : http://tile.stamen.com/toner-lite/12/653/1582.png
## Map from URL : http://tile.stamen.com/toner-lite/12/654/1582.png
## Map from URL : http://tile.stamen.com/toner-lite/12/655/1582.png
## Map from URL : http://tile.stamen.com/toner-lite/12/656/1582.png
## Map from URL : http://tile.stamen.com/toner-lite/12/653/1583.png
## Map from URL : http://tile.stamen.com/toner-lite/12/654/1583.png
## Map from URL : http://tile.stamen.com/toner-lite/12/655/1583.png
## Map from URL : http://tile.stamen.com/toner-lite/12/656/1583.png
## Map from URL : http://tile.stamen.com/toner-lite/12/653/1584.png
## Map from URL : http://tile.stamen.com/toner-lite/12/654/1584.png
## Map from URL : http://tile.stamen.com/toner-lite/12/655/1584.png
## Map from URL : http://tile.stamen.com/toner-lite/12/656/1584.png

Change the basemap

ggmap(sf_basemap_lite) +
  geom_point(data=sfhomes15, aes(x=lon, y=lat, col=totvalue))

Facets / Small Multiples / Micro Maps

# Let's look at last 5 years
sfhomes2010_2015 <- subset(sfhomes, as.numeric(SalesYear) > 2009)

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes2010_15 )  +
  facet_wrap(~ SalesYear)

Facet/ Small Multiples / Micro Maps

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes2010_15 )  +
  facet_wrap(~ SalesYear)

Challenge

Redo above facet map with the following changes:

  • Use data from 1995 - 1999
  • Use a different basemap (eg maptype="terrain")

Challenge

sfhomes1995_1999 <- subset(sfhomes, (as.numeric(SalesYear) >= 1995) & (as.numeric(SalesYear) <= 1999))

ggmap(sf_basemap_lite) +
  geom_point(aes(lon, lat, col=totvalue), data = sfhomes1995_1999 )  +
  facet_wrap(~ SalesYear)

GGMap and GGPlot are great but…

BUT!

There are limits to what you can do with geospatial data stored in a dataframe

and mapping the data with ggplot and ggmap

Can't read & plot geospatial data files

Shapefile is the most common format for non-point geospatial data.

Can't directly read or plot a shapefile with data frames/ggplot/ggmap

Can't transform Coordinate Data

Spatial Analysis

  • What properties are within walking distance (.25 miles) of a BART station?

  • Average sales price by census tract?

Most spatial operations require spatial objects and methods

Spatial Data Objects in R

sp Package

The sp Package

Classes and Methods for Spatial Data

The SP package is most commonly used to construct and manipulate spatial data objects in R.

Hundreds of other R packages that do things with spatial data typically build on SP objects.

IMPORTANT ANNOUNCEMENT

sf package

The sf, or simple features package in R has many improvements

Based on open standards for specifying spatial data

ggplot and ggmap can map sf spatial objects

But most spatial packages still depend on sp

So, live on the bleeding edge or check back in a year or so.

sp package

sp package

Take a look at the different types of spatial object classes supported by sp

library(sp)
getClass("Spatial") 
## Class "Spatial" [package "sp"]
## 
## Slots:
##                               
## Name:         bbox proj4string
## Class:      matrix         CRS
## 
## Known Subclasses: 
## Class "SpatialPoints", directly
## Class "SpatialMultiPoints", directly
## Class "SpatialGrid", directly
## Class "SpatialLines", directly
## Class "SpatialPolygons", directly
## Class "SpatialPointsDataFrame", by class "SpatialPoints", distance 2
## Class "SpatialPixels", by class "SpatialPoints", distance 2
## Class "SpatialMultiPointsDataFrame", by class "SpatialMultiPoints", distance 2
## Class "SpatialGridDataFrame", by class "SpatialGrid", distance 2
## Class "SpatialLinesDataFrame", by class "SpatialLines", distance 2
## Class "SpatialPixelsDataFrame", by class "SpatialPoints", distance 3
## Class "SpatialPolygonsDataFrame", by class "SpatialPolygons", distance 2

```

sp Vector Objects

Geometry Spatial Object Spatial Object with Attributes
Points SpatialPoints SpatialPointsDataFrame
Lines SpatialLines SpatialLinesDataFrame
Polygons SpatialPolygons SpatialPolygonsDataFrame

We use the S*DF objects most frequently!

From Data Frame to SpatialPointsDataFrame

From Data Frame to SpatialPointsDataFrame

Let's transform the sfhomes15 data frame to an sp object of type SpatialPointsDataFrame

sp::coordinates()

Use the sp::coordinates() method - Sets or retrieves spatial coordinates

When transforming a DF to SPDF, requires - the object that will get the coordinates - the names of the columns that contain the X and Y coordinates

Data Frame to SPDF

# First make a copy of the data frame
sfhomes15_sp <- sfhomes15

coordinates(sfhomes15_sp) <- c('lon','lat') # ORDER MATTERS!!

class(sfhomes15_sp) # check it
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"

sp::coordinates()

You transformed a data frame to an SPDF using:

coordinates(sfhomes15_sp) <- c('lon','lat')

Now try this:

coordinates(sfhomes15_sp)

Compare the SPDF to DF

str(sfhomes15) # the data frame
## 'data.frame':    889 obs. of  19 variables:
##  $ FiscalYear      : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ SalesDate       : Factor w/ 6639 levels "1920-01-30","1920-01-31",..: 6412 6507 6508 6599 6422 6536 6398 6406 6586 6421 ...
##  $ Address         : Factor w/ 166180 levels "#200 0530-SANCHEZ             ST0000",..: 40288 30412 18494 80285 143414 89595 134736 147862 2482 80378 ...
##  $ YearBuilt       : int  1978 1950 2004 1951 2015 2008 1983 1951 1963 1925 ...
##  $ NumBedrooms     : int  3 2 1 0 2 1 1 0 1 2 ...
##  $ NumBathrooms    : int  3 2 1 0 2 1 1 0 1 2 ...
##  $ NumRooms        : int  7 5 0 4 5 3 3 0 3 5 ...
##  $ NumStories      : int  2 1 0 1 1 0 1 0 0 3 ...
##  $ NumUnits        : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ AreaSquareFeet  : int  2739 1147 740 836 1272 745 753 1434 775 1264 ...
##  $ ImprovementValue: int  411176 396556 446202 135000 817276 410000 85235 187312 412500 483005 ...
##  $ LandValue       : int  959410 925298 446202 315000 817276 410000 127852 561940 412500 483005 ...
##  $ Neighborhood    : Factor w/ 41 levels "","Bayview Hunters Point",..: 13 36 7 29 20 35 20 25 5 27 ...
##  $ Location        : Factor w/ 124331 levels "(37.7081643340418, -122.451068309854)",..: 73331 28165 106031 13630 90593 102763 88072 6855 115848 93189 ...
##  $ SupeDistrict    : int  7 7 6 9 8 6 8 11 3 1 ...
##  $ totvalue        : int  1370586 1321854 892404 450000 1634552 820000 213087 749252 825000 966010 ...
##  $ SalesYear       : int  2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##  $ lat             : num  37.8 37.7 37.8 37.7 37.8 ...
##  $ lon             : num  -122 -122 -122 -122 -122 ...

SPDF

str(sfhomes15_sp) # the SPDF
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
##   ..@ data       :'data.frame':  889 obs. of  17 variables:
##   .. ..$ FiscalYear      : int [1:889] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##   .. ..$ SalesDate       : Factor w/ 6639 levels "1920-01-30","1920-01-31",..: 6412 6507 6508 6599 6422 6536 6398 6406 6586 6421 ...
##   .. ..$ Address         : Factor w/ 166180 levels "#200 0530-SANCHEZ             ST0000",..: 40288 30412 18494 80285 143414 89595 134736 147862 2482 80378 ...
##   .. ..$ YearBuilt       : int [1:889] 1978 1950 2004 1951 2015 2008 1983 1951 1963 1925 ...
##   .. ..$ NumBedrooms     : int [1:889] 3 2 1 0 2 1 1 0 1 2 ...
##   .. ..$ NumBathrooms    : int [1:889] 3 2 1 0 2 1 1 0 1 2 ...
##   .. ..$ NumRooms        : int [1:889] 7 5 0 4 5 3 3 0 3 5 ...
##   .. ..$ NumStories      : int [1:889] 2 1 0 1 1 0 1 0 0 3 ...
##   .. ..$ NumUnits        : int [1:889] 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..$ AreaSquareFeet  : int [1:889] 2739 1147 740 836 1272 745 753 1434 775 1264 ...
##   .. ..$ ImprovementValue: int [1:889] 411176 396556 446202 135000 817276 410000 85235 187312 412500 483005 ...
##   .. ..$ LandValue       : int [1:889] 959410 925298 446202 315000 817276 410000 127852 561940 412500 483005 ...
##   .. ..$ Neighborhood    : Factor w/ 41 levels "","Bayview Hunters Point",..: 13 36 7 29 20 35 20 25 5 27 ...
##   .. ..$ Location        : Factor w/ 124331 levels "(37.7081643340418, -122.451068309854)",..: 73331 28165 106031 13630 90593 102763 88072 6855 115848 93189 ...
##   .. ..$ SupeDistrict    : int [1:889] 7 7 6 9 8 6 8 11 3 1 ...
##   .. ..$ totvalue        : int [1:889] 1370586 1321854 892404 450000 1634552 820000 213087 749252 825000 966010 ...
##   .. ..$ SalesYear       : int [1:889] 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
##   ..@ coords.nrs : int [1:2] 19 18
##   ..@ coords     : num [1:889, 1:2] -122 -122 -122 -122 -122 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:889] "151677" "150126" "125377" "65555" ...
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   ..@ bbox       : num [1:2, 1:2] -122.5 37.7 -122.4 37.8
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   .. .. ..$ : chr [1:2] "min" "max"
##   ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
##   .. .. ..@ projargs: chr NA

SPDF Objects

SPDF Objects

You can see from str(sfhomes) that a SPDF object is a collection of slots or components. The key ones are:

  • @data data frame of attributes that describe each location
  • @coords the coordinates for each geometric object - here points
  • @bbox the min and max bounding coordinates
  • @proj4string the coordinate reference system defintion as a string

Explore the object in the Environment window

SPDF Slots

Review the output of each of these:

summary(sfhomes15_sp)
head(sfhomes15_sp@data)
class(sfhomes15_sp@data)

sfhomes15_sp@bbox
bbox(sfhomes15_sp)

head(sfhomes15_sp@coords)
head(sfhomes15_sp$lat)
head(sfhomes15_sp$lon)

sfhomes15_sp@proj4string
proj4strings(sfhomes15_sp)

Take a closer look

Look at sfhomes15_sp in the environment window

What's missing

Are all the columns that were present in sfhomes15 also in sfhomes15_sp?

Is there a slot in sfhomes15_sp without data?

What is the CRS of the data?

proj4string(sfhomes15_sp) # get a CRS object
## [1] NA

Map an SpatialPointsDataFrame

plot(sfhomes15_sp)  # using sp::plot

So…..?

Recap

We created great maps of sfhomes point data with ggplot and ggmap.

Then we created a simple map of the SPDF with plot.

We aren't seeing the value of sp objects just yet.

Let's add more geospatial data

Reading in Geospatial Data

There's an R package for that!

rgdal

rgdal is an R port of the powerful and widely used GDAL library.

It is the most commonly used R library for importing and exporting spatial data.

  • OGR: for vector data: readOGR() and writeOGR()

  • GDAL for raster data: readGDAL() and writeGDAL()

rgdal

library(rgdal)
## Warning: package 'rgdal' was built under R version 3.4.3
## rgdal: version: 1.2-16, (SVN revision 701)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.4/Resources/library/rgdal/proj
##  Linking to sp version: 1.2-5
# See what file types are supported by rgdal drivers
# ogrDrivers()$name

Getting help

gdal.org

`?readOGR

For more info on working with rgdal to load different types of spatial data in R see this excellent tutorial by Zev Ross.

Read in a Shapefile with the boundary of San Francisco

Take a look at the file(s)

dir("data", pattern="sf_boundary")
## [1] "sf_boundary.dbf" "sf_boundary.prj" "sf_boundary.shp" "sf_boundary.shx"

Read in Shapefile

sfboundary <- readOGR(dsn="data",layer="sf_boundary")
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "sf_boundary"
## with 1 features
## It has 3 fields
# or
# sfboundary <- readOGR("data","sf_boundary")
# but not
#sfboundary <- readOGR(dsn="data/",layer="sf_boundary")

Check out the data structure

What type of sp object is sfboundary? How many features are in the object?

class(sfboundary)
str(sfboundary) 
head(sfboundary@data)  

Explore the object in the Envi window

Make a quick plot of sfboundary

How?

Make a quick plot of sfboundary

plot(sfboundary)

Demonstration - A more complex SpatialPolygonsDataFrame

library(tigris)
calcounties <- counties(state="California", cb=T)
class(calcounties)
sf_cen <- subset(calcounties, COUNTYFP == "075")
plot(sf_cen)
head(sf_cen@data)
# Now look in Envi Window

What is the the sfboundary CRS

Is it a geographic or projected CRS?

proj4string(sfboundary)

Map both sfboundary & sfhomes15_sp

plot(sfboundary)
points(sfhomes15_sp, col="red")

Map both sfboundary & sfhomes15_sp

Where are the points? What's wrong?

plot(sfboundary)
points(sfhomes15_sp, col="red")

What's Wrong?

Compare the CRSs, are they the same?

proj4string(sfboundary)
proj4string(sfhomes15_sp)
proj4string(sfboundary) == proj4string(sfhomes15_sp)

Compare the CRSs, are they the same?

proj4string(sfboundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sfhomes15_sp)
## [1] NA
proj4string(sfboundary) == proj4string(sfhomes15_sp)
## [1] NA

Compare the coordinate values

sfboundary@bbox
##         min       max
## x  542696.6  556659.9
## y 4173563.7 4185088.6
sfhomes15_sp@bbox
##            min        max
## lon -122.51065 -122.36964
## lat   37.70821   37.80586

CRS Problems

The #1 reason…

CRS Definitions

All sp objects should have a defined CRS

If not, one needs to be assigned to the object - This is called defining a projection. - This doesn't change the coordinates.

CRS Transformations

All sp objects should have the same CRS.

  • When they don't, they need to be transformed to a common CRS.

  • This is also called a projection transformation,
    • or projecting or reprojection.
  • Projection transformation returns a new spatial object with the transformed coordinates

CRS Definitions and Transformations

Geospatial-aware software will have a database of definitions for thousands of Earth referenced coordinate systems

Defining a CRS

We need to define the CRS of what sp object? - sfboundary or sfhomes15_sp?

We need to know - the appropriate CRS for the data - how to define the CRS - how to assign it to the sp object

Define a CRS

sp includes the CRS() function to define a CRS

  • Two ways
# use an EPSG code
proj4string(sfhomes15_sp) <- CRS("+init=epsg:4326") 

# or enter the proj4 string
# proj4string(sfhomes15_sp) <- CRS("+proj=longlat 
#                               +ellps=WGS84 +datum=WGS84 +no_defs")  

check it

proj4string(sfhomes15_sp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

Notes

  • proj4string() can get or set the CRS
  • proj4 is a library for managing map projections and CRSs
  • epsg codes are used as short-hand for CRSs definitions

Compare the CRSs, AGAIN

proj4string(sfboundary)
## [1] "+proj=utm +zone=10 +datum=NAD83 +units=m +no_defs +ellps=GRS80 +towgs84=0,0,0"
proj4string(sfhomes15_sp)
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
proj4string(sfboundary) == proj4string(sfhomes15_sp)
## [1] FALSE

Transforming a CRS

Transform the CRS

Use sp function spTransform

Requires as input:

  • a sp object to transform with a defined CRS

  • a target CRS

Outputs a new spatial object with coordinate data in the target CRS

  • It does not change the input data

Transform sfboundary

from UTM10 CRS to WGS84 (why?)

sfboundary_lonlat <- spTransform(sfboundary, CRS("+init=epsg:4326"))

# or
# sfboundary_lonlat <- spTransform(sfboundary, 
#                              CRS(proj4string(sfhomes15_sp)))

How do these two spTransform approaches differ?

Did it work?

How will we know?

Do the CRSs match?

proj4string(sfhomes15_sp) == proj4string(sfboundary_lonlat)

Overlay the data in space

plot(sfboundary_lonlat)
points(sfhomes15_sp, col="red")
points(sfhomes15_sp[sfhomes15_sp$totvalue<1000000,], col="green")

Overlay the data in space

Woo-hoo!

So….

We can transform sp objects to the same CRS for mapping and spatial analysis (we'll get there!)

Save sfboundary_lonlat

Use writeOGR to save sfboundary_lonlat to a new shapefile

See ?writeOGR for help

# write transformed data to a new shapefile 
writeOGR(sfboundary_lonlat, 
          dsn = "data", 
          layer = "sfbounary_lonlat", 
          driver="ESRI Shapefile")

# is it there?
dir("data")

Projections, CRS, oh my!

We want all data in the same CRS

Which one is best?

Finding CRS Codes

Common CRS Codes

Geographic CRSs

  • 4326 Geographic, WGS84 (default for lon/lat)

  • 4269 Geographic, NAD83 (USA Fed agencies like Census)

Projected CRSs

  • 5070 USA Contiguous Albers Equal Area Conic

  • 3310 CA ALbers Equal Area

  • 26910 UTM Zone 10, NAD83 (Northern Cal)

  • 3857 Web Mercator (web maps)

QUESTIONS?

Break…..

Mapping Spatial Objects

So far we have created maps with

base::plot, ggplot, ggmap for geospatial data in data frames

  • great for creating maps given these types of data

sp::plot for sp objects

  • meh, but great for a quick look at spatial data

There is also sp::spplot

spplot

# map of the sfhomes data by totalvaue
spplot(sfhomes15_sp,"totvalue")

tmap

tmap

tmap stands for thematic map

Great maps with less code than the alternatives

Syntax should be familar to ggplot2 users, but simpler

Relatively easy to create interactive maps

tmap starting points

tmap

Load the library

library(tmap)
## Warning: package 'tmap' was built under R version 3.4.3

Quick tmap (qtm)

qtm(sfhomes15_sp)

Quick Interactive tmap

tmap_mode("view")
## tmap mode set to interactive viewing
qtm(sfhomes15_sp)

Reset the mode to static plot

tmap_mode("plot")
## tmap mode set to plotting

qtm of totvalue

NOTE: columnn names must be quoted!

qtm(sfhomes15_sp, symbol.col="totvalue")

qtm of totvalue

Challenge

Create a static qtm of sfboundary_lonlat

  • bonus, set the border color to black and the fill to beige

See ?qtm for description of all options

sfboundary_lonlat quickmap

qtm(sfboundary_lonlat, borders="black", fill="beige")

Crafting complex tmaps

tmap Shapes and Graphic Elements

tmap's flexibility comes in how it intitively allows you to layer spatial data and style the layers by data attributes

Use tm_shape(<sp_object>) to specifiy a geospatial data layer

Add + tm_<element>(...) to style the layer by data values

…and other options for creating a publication ready map

Exploring tmap functionality

?tmap_shape

?tmap_element

"Slow" tmaps

tm_shape(sfboundary_lonlat) + tm_polygons(col="beige", border.col="black")

tmap of totvalue

tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.5)

Challenge

Make the previous map interactive

Zoom in on light & dark colored points to see if the corressponding totvalue makes sense.

Reset the mode to static

tmap_mode("plot")
## tmap mode set to plotting

Overlaying layers

tm_shape(sfboundary_lonlat) + tm_polygons(col="black", border.col="grey") + 
   tm_shape(sfhomes15_sp) + tm_dots(col="totvalue", size=.5)

Challenge

Redo the last map but

change the legend title to "San Francisco Home Sales, 2015"

  • hint: title=

tmap and CRSs

tmap and CRSs

We have been mapping data in WGS84 (lon/lat) CRS.

Not a great idea for static maps of larger areas as distortion becomes evident.

Let's explore that distortion and how to address with tmap

US States

Let's load and map data for US states.

Data are in the file data/us_states_pop.shp

Challenge

Use readOGR to load data/us_states_pop.shp into an sp object named us_states

Read in & plot the Data

us_states <- readOGR("./data", "us_states_pop")
## OGR data source with driver: ESRI Shapefile 
## Source: "./data", layer: "us_states_pop"
## with 49 features
## It has 4 fields

Take a quick look

qtm(us_states)

Questions

Review us_states with the sp commands we used earlier and / or explore in the Environment window.

  • What type of sp object is us_states

  • How many features does it contain?

  • How many attributes describe those features?

  • What is the CRS?

Customizing the display

tm_shape(us_states) + tm_polygons(col="grey", border.col = "white")

CRS and Shape

Notice anything odd about shape of USA?

Dynamically Transforming the CRS

tm_shape(us_states, projection="+init=epsg:5070") + tm_polygons(col="grey", border.col = "white")

What's happening here?

tm_shape(us_states, projection="+init=epsg:5070") + tm_polygons(col="grey", border.col = "white") +
tm_shape(us_states) + tm_borders(col="purple") 

Dynamic CRS Transformations

Also called On-the-fly reprojection in ArcGIS & QGIS

Very cool!

BUT, if you want to use data in a different CRS it is best to transform it.

Challenge

  • Transform the us_states data to the USA Contiguous Albers CRS (5070),

  • Save output as a new SpatialPolygonsDataFrame called us_states_5070

us_states_5070

us_states_5070 <- spTransform(us_states, CRS("+init=epsg:5070"))

Plotting the Transformed Data

tm_shape(us_states_5070) + tm_polygons(col="beige") +
  tm_shape(us_states) + tm_borders(col="purple")

Questions?

Recap

  • Geospatial Data in R
  • CSV > Data Frame > ggplot/ggmap
  • GDAL, readOGR, writeOGR
  • Data Frame to SDPF
  • CRS definitions and transformations
  • Overlays
  • tmap